Improving Approximate Value Iteration with Complex Returns by Bounding
Abstract
Approximate value iteration (AVI) is a widely used technique in reinforcement learning. Most AVI methods do not take full advantage of the sequential relationship between samples within a trajectory when deriving value estimates, because of the difficulty of handling the inherent bias and variance of n-step returns. We propose a bounding method that uses a negatively biased but relatively low-variance estimator, generated from a complex return, as a lower bound on the observed value of a traditional one-step return estimator. We also develop a new Bounded Fitted Q-Iteration (Bounded FQI) algorithm, which efficiently incorporates the bounding method into an AVI framework. Experiments show that our method produces more accurate value estimates than existing approaches, resulting in improved policies.
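The abstract does not define the estimator precisely, so the following is only a minimal Python sketch under stated assumptions. It assumes the complex return is a truncated λ-return (a weighted average of the n-step returns along one stored trajectory segment), that bootstrap values come from the current Q estimate as max_a Q(s_{t+n}, a), and that the bounded target is the larger of the one-step return and the complex return. The function names `n_step_returns`, `lambda_return`, and `bounded_target` are hypothetical and are not taken from the paper.

```python
import numpy as np


def n_step_returns(rewards, bootstrap_values, gamma):
    """Return the n-step returns R^(1), ..., R^(N) for the first sample of a segment.

    rewards[k]          -- reward observed k steps after the sample (k = 0..N-1)
    bootstrap_values[k] -- assumed estimate of max_a Q(s_{t+k+1}, a) from the
                           current Q function (use 0.0 for a terminal state)
    """
    if len(rewards) == 0 or len(rewards) != len(bootstrap_values):
        raise ValueError("need equal-length, non-empty rewards and bootstrap_values")
    returns = []
    discounted_rewards = 0.0
    for n in range(1, len(rewards) + 1):
        # Accumulate the discounted sum of the first n rewards.
        discounted_rewards += gamma ** (n - 1) * rewards[n - 1]
        # Bootstrap from the (assumed) value estimate n steps ahead.
        returns.append(discounted_rewards + gamma ** n * bootstrap_values[n - 1])
    return np.array(returns)


def lambda_return(n_returns, lam):
    """Truncated lambda-return: a weighted average of the n-step returns.

    Weights are (1 - lam) * lam**(n - 1) for n < N and lam**(N - 1) for the
    last return, so they sum to 1.
    """
    N = len(n_returns)
    weights = [(1 - lam) * lam ** (n - 1) for n in range(1, N)] + [lam ** (N - 1)]
    return float(np.dot(weights, n_returns))


def bounded_target(rewards, bootstrap_values, gamma, lam):
    """Hypothetical bounded target: the complex return serves as a lower
    bound on the one-step return, so the larger of the two is used."""
    returns = n_step_returns(rewards, bootstrap_values, gamma)
    one_step = float(returns[0])
    complex_return = lambda_return(returns, lam)
    return max(one_step, complex_return)
```

For rewards `[1.0, 0.0, 2.0]`, bootstrap values `[0.5, 3.0, 1.0]`, `gamma=0.9`, and `lam=0.5`, the one-step return is 1.45 and the λ-return is about 2.42, so the bounded target is about 2.42. In an FQI-style loop, a target like this would stand in for the plain one-step regression target before each refit of Q; taking the maximum means the complex return can only raise a one-step target it exceeds, never lower it.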
Similar resources
CFQI: Fitted Q-Iteration with Complex Returns
Fitted Q-Iteration (FQI) is a popular approximate value iteration (AVI) approach that makes effective use of off-policy data. FQI uses a 1-step return value update which does not exploit the sequential nature of trajectory data. Complex returns (weighted averages of the n-step returns) use trajectory data more effectively, but have not been used in an AVI context because of off-policy bias. In ...
Application of variational iteration method for solving singular two point boundary value problems
In this paper, He's highly prolific variational iteration method is applied effectively for showing the existence, uniqueness and solving a class of singular second order two point boundary value problems. The process of finding solution involves generation of a sequence of appropriate and approximate iterative solution function equally likely to converge to the exact solution of the given problem w...
Solving time-fractional chemical engineering equations by modified variational iteration method as fixed point iteration method
The variational iteration method (VIM) was extended to find approximate solutions of fractional chemical engineering equations. The Lagrange multipliers of the VIM were not identified explicitly. In this paper we improve the VIM by using the concept of the fixed point iteration method. Then this method was implemented for solving a system of time-fractional chemical engineering equations. The ob...
Randomized Block Krylov Methods for Stronger and Faster Approximate Singular Value Decomposition
Since being analyzed by Rokhlin, Szlam, and Tygert [1] and popularized by Halko, Martinsson, and Tropp [2], randomized Simultaneous Power Iteration has become the method of choice for approximate singular value decomposition. It is more accurate than simpler sketching algorithms, yet still converges quickly for any matrix, independently of singular value gaps. After Õ(1/ε) iterations, it gives ...
On Approximate Stationary Radial Solutions for a Class of Boundary Value Problems Arising in Epitaxial Growth Theory
In this paper, we consider a non-self-adjoint, singular, nonlinear fourth order boundary value problem which arises in the theory of epitaxial growth. It is possible to reduce the fourth order equation to a singular boundary value problem of second order given by $w'' - \frac{1}{r} w' = \frac{w^2}{2r^2} + \frac{1}{2}\lambda r^2$. The problem depends on the parameter λ and admits multiple solutions. Therefore, it is difficult to...